The limits of squared Euclidean distance regularization

Authors

  • Michal Derezinski
  • Manfred K. Warmuth
Abstract

Some of the simplest loss functions considered in Machine Learning are the square loss, the logistic loss and the hinge loss. Algorithms from the most common family, including Gradient Descent (GD) with and without Weight Decay, always predict with a linear combination of the past instances. We give a random construction of sets of examples where the target linear weight vector is trivial to learn but any algorithm from the above family is drastically sub-optimal. Our lower bound on the latter algorithms holds even if the algorithms are enhanced with an arbitrary kernel function. This type of result was known for the square loss. However, we develop new techniques that let us prove such hardness results for any loss function satisfying some minimal requirements (including the three losses listed above). We also show that algorithms that regularize with the squared Euclidean distance are easily confused by random features. Finally, we conclude by discussing related open problems regarding feedforward neural networks. We conjecture that our hardness results hold for any training algorithm based on squared Euclidean distance regularization (i.e., Back-propagation with the Weight Decay heuristic).
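The structural restriction behind this lower bound is easy to verify numerically. Below is a minimal sketch (not the paper's construction; the data, step size and decay coefficient are illustrative assumptions): gradient descent with Weight Decay on the logistic loss, started at zero, followed by a check that the learned weight vector stays in the span of the training instances.

```python
import numpy as np

# Minimal sketch: every gradient of the logistic loss is a weighted sum of
# the instances, and the weight-decay term only rescales the current weights,
# so GD started at zero can never leave the span of the training instances.
rng = np.random.default_rng(0)
n, d = 20, 50                        # fewer examples than dimensions
X = rng.standard_normal((n, d))      # instances (rows)
y = rng.choice([-1.0, 1.0], size=n)  # labels

eta, decay = 0.01, 0.01              # step size and weight-decay coefficient
w = np.zeros(d)
for _ in range(1000):
    margins = y * (X @ w)
    # gradient of sum_i log(1 + exp(-y_i w.x_i)) plus the weight-decay term
    grad = -(X.T @ (y / (1.0 + np.exp(margins)))) + decay * w
    w -= eta * grad

# Project w onto span(X^T); the residual is zero up to rounding error.
coeffs, *_ = np.linalg.lstsq(X.T, w, rcond=None)
print("residual outside the span:", np.linalg.norm(X.T @ coeffs - w))
```

Replacing the inner products with a kernel changes the feature space but not this restriction, which is why the lower bound survives arbitrary kernelization.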


Similar resources

Improvement of the Classification of Hyperspectral images by Applying a Novel Method for Estimating Reference Reflectance Spectra

A hyperspectral image contains rich spectral information: a large number of narrow spectral bands covering a continuous spectral range. This allows materials and objects to be identified and recognized by comparing their spectral reflectance at different wavelengths. Hence, hyperspectral imagery can be very efficient for generating land cover maps. In the hy...


Csiszár's Divergences for Non-negative Matrix Factorization: Family of New Algorithms

In this paper we discuss a wide class of loss (cost) functions for non-negative matrix factorization (NMF) and derive several novel algorithms with improved efficiency and robustness to noise and outliers. We review several approaches which allow us to obtain generalized forms of multiplicative NMF algorithms and unify some existing algorithms. We also give the flexible and relaxed form of the N...
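For reference, a minimal sketch of the classic multiplicative updates for the squared Euclidean cost ||V − WH||²_F (Lee and Seung); the Csiszár-divergence algorithms discussed above generalize multiplicative rules of this shape. The matrix sizes, rank and iteration count are illustrative assumptions.

```python
import numpy as np

# Classic multiplicative NMF updates for ||V - W H||_F^2: each factor is
# rescaled entrywise, which preserves non-negativity by construction.
rng = np.random.default_rng(0)
V = rng.random((40, 30))            # non-negative data matrix
r = 5                               # factorization rank
W = rng.random((40, r)) + 1e-3      # small offsets avoid zero locking
H = rng.random((r, 30)) + 1e-3

for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)   # update H with W fixed
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)   # update W with H fixed

print("reconstruction error:", np.linalg.norm(V - W @ H))
```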


Convex Inverse Scale Spaces

Inverse scale space methods are derived as asymptotic limits of iterative regularization methods. They have proven to be efficient methods for denoising gray-valued images and for the evaluation of unbounded operators. Initially, inverse scale space methods were derived from iterative regularization methods with squared Hilbert norm regularization terms, and later this concept wa...
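A sketch of that construction in standard notation (J a convex regularizer, K the forward operator, f the data; this follows the textbook formulation rather than necessarily the paper's exact setting). Each step of the iterative regularization minimizes the data term plus a Bregman distance to the previous iterate, and letting the step size tend to zero yields the inverse scale space flow:

```latex
% Iterative (Bregman-type) regularization for the equation Ku = f:
u_k = \arg\min_u \; \tfrac{\lambda}{2}\,\lVert K u - f \rVert^2
      + D_J^{\,p_{k-1}}(u, u_{k-1}),
\qquad
D_J^{\,p}(u, v) = J(u) - J(v) - \langle p,\, u - v \rangle .

% Asymptotic limit as the step size tends to zero: the inverse scale space flow
\partial_t\, p(t) = K^{*}\bigl(f - K u(t)\bigr),
\qquad p(t) \in \partial J\bigl(u(t)\bigr), \quad u(0) = 0 .
```

The flow starts from the coarsest scales of f and reintroduces finer scales as t grows, which is the sense in which the scale space runs "inverse".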


BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights

We propose BinaryRelax, a simple two-phase algorithm, for training deep neural networks with quantized weights. The set constraint that characterizes the quantization of weights is not imposed until the late stage of training, and a sequence of pseudo quantized weights is maintained. Specifically, we relax the hard constraint into a continuous regularizer via Moreau envelope, which turns out to...
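A hypothetical sketch of that two-phase idea (the toy loss, penalty schedule and binary quantizer below are assumptions for illustration, not the authors' code). Phase one replaces the hard constraint w ∈ Q by the envelope-style penalty (λ/2)·dist(w, Q)², whose gradient is λ(w − proj_Q(w)) wherever the projection is unique; phase two imposes the constraint exactly:

```python
import numpy as np

def project_binary(w):
    """Nearest point in Q = {±s}^d with the scale s = mean(|w_i|)."""
    s = np.mean(np.abs(w))
    return s * np.sign(w)

rng = np.random.default_rng(0)
w = rng.standard_normal(100)        # stand-in for one layer's weights
eta = 0.05

# Phase 1: relaxed training; the penalty weight lam grows over time, so the
# iterates are pseudo-quantized rather than exactly quantized.
for step in range(1000):
    lam = 1e-3 * (step + 1)
    grad_loss = w - 1.0             # gradient of a toy loss 0.5*||w - 1||^2
    grad_penalty = lam * (w - project_binary(w))
    w -= eta * (grad_loss + grad_penalty)

# Phase 2: impose the set constraint exactly (hard quantization).
w = project_binary(w)
print("distinct weight values:", np.unique(np.round(w, 6)))
```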


Regularization by Early Stopping in Single Layer Perceptron Training

Adaptive training of the non-linear single-layer perceptron can lead to the Euclidean distance classifier and later to the standard Fisher linear discriminant function. On the way between these two classifiers one obtains a regularized discriminant analysis, which is equivalent to adding a "weight decay" regularization term to the cost function. Thus early stopping plays the role of regularization ...
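The correspondence is easiest to see on linear least squares, where both early-stopped gradient descent and weight decay (ridge) shrink the solution along the eigendirections of the Gram matrix. A minimal sketch on synthetic data (the step size, stopping time and the matching heuristic α ≈ 1/(ηt) are illustrative assumptions):

```python
import numpy as np

# After t GD steps from zero on 0.5*||Xw - y||^2, each eigendirection of
# X^T X with eigenvalue s is shrunk by 1 - (1 - eta*s)^t; ridge ("weight
# decay") with strength alpha shrinks the same direction by s / (s + alpha).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = rng.standard_normal(200)

eta, t = 0.001, 5                   # step size and (early) stopping time
alpha = 1.0 / (eta * t)             # heuristic matching ridge strength

w_gd = np.zeros(10)
for _ in range(t):
    w_gd -= eta * X.T @ (X @ w_gd - y)

w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(10), X.T @ y)

s = np.linalg.eigvalsh(X.T @ X)     # spectrum of the Gram matrix
print("GD shrinkage:   ", 1 - (1 - eta * s) ** t)
print("ridge shrinkage:", s / (s + alpha))
```

The two shrinkage profiles are not identical, but both increase monotonically with the eigenvalue, and stopping earlier corresponds to a larger α; this is the sense in which early stopping acts as a weight-decay regularizer.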



Publication date: 2014